Search CORE

17 research outputs found

Quantification of the variation in percentage identity for protein sequence alignments

Author: Barton Geoffrey J
Raghava GPS
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Percentage Identity (PID) is frequently quoted in discussion of sequence alignments since it appears simple and easy to understand. However, although there are several different ways to calculate percentage identity and each may yield a different result for the same alignment, the method of calculation is rarely reported. Accordingly, quantification of the variation in PID caused by the different calculations would help in interpreting PID values in the literature. In this study, the variation in PID was quantified systematically on a reference set of 1028 alignments generated by comparison of the protein three-dimensional structures. Since the alignment algorithm may also affect the range of PID, this study also considered the effect of algorithm, and the combination of algorithm and PID method. RESULTS: The maximum variation in PID due to the calculation method was 11.5% while the effect of alignment algorithm on PID was up to 14.6% across three popular alignment methods. The combined effect of alignment algorithm and PID calculation gave a variation of up to 22% on the test data, with an average of 5.3% ± 2.8% for sequence pairs with < 30% identity. In order to see which PID method was most highly correlated with structural similarity, four different PID calculations were compared to similarity scores (Sc) from the comparison of the corresponding protein three-dimensional structures. The highest correlation coefficient for a PID calculation was 0.80. In contrast, the more sophisticated Z-score calculated by reference to randomized sequences gave a correlation coefficient of 0.84. CONCLUSION: Although it is well known amongst expert sequence analysts that PID is a poor score for discriminating between protein sequences, the apparent simplicity of the percentage identity score encourages its widespread use in establishing cutoffs for structural similarity. 
This paper illustrates not only that PID is a poor measure of sequence similarity when compared to the Z-score, but also that there is a large uncertainty in reported PID values. Since better alternatives to PID exist to quantify sequence similarity, these should be quoted where possible in preference to PID. The findings presented here should prove helpful to those new to sequence analysis and serve as a warning to those who seek to interpret the value of a PID reported in the literature.
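To show how the choice of calculation alone changes a PID value, here is a minimal Python sketch that scores one pairwise alignment under four commonly used denominators: all alignment columns, aligned (ungapped) positions, the shorter sequence length, and the mean sequence length. The abstract does not spell out the paper's four PID formulas, so these denominators and the helper name `pid_variants` are illustrative assumptions, not the authors' exact definitions.

```python
def pid_variants(aligned_a: str, aligned_b: str, gap: str = "-") -> dict:
    """Percentage identity of one pairwise alignment under four denominators.

    Hypothetical helper for illustration; both strings must be rows of the
    same alignment (equal length, gaps written as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    identical = 0       # columns where both residues are present and equal
    aligned_pairs = 0   # columns with a residue in both sequences
    columns = 0         # all columns except those gapped in both rows
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        columns += 1
        if x != gap and y != gap:
            aligned_pairs += 1
            if x == y:
                identical += 1

    len_a = len(aligned_a.replace(gap, ""))  # ungapped sequence lengths
    len_b = len(aligned_b.replace(gap, ""))

    def pct(denominator: float) -> float:
        return 100.0 * identical / denominator if denominator else 0.0

    return {
        "alignment_length": pct(columns),          # gapped columns included
        "aligned_positions": pct(aligned_pairs),   # gapped columns excluded
        "shorter_sequence": pct(min(len_a, len_b)),
        "mean_length": pct((len_a + len_b) / 2),
    }
```

For the toy alignment `pid_variants("ACDEFGHIK", "ACDQ-GH--")` there are 5 identities, and the four values range from about 55.6% (alignment length) to 83.3% (aligned positions or shorter sequence), which is exactly the kind of spread the abstract warns about when the method of calculation goes unreported.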

Directory of Open Access Journals

PubMed Central

University of Dundee Online Publications

Analysis and prediction of antibacterial peptides

Author: Lata Sneh
Raghava GPS
Sharma BK
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Antibacterial peptides are important components of the innate immune system, used by the host to protect itself from different types of pathogenic bacteria. Over the last few decades, the search for new drugs and drug targets has prompted interest in these antibacterial peptides. We analyzed 486 antibacterial peptides, obtained from the antimicrobial peptide database APD, in order to understand the preference for amino acid residues at specific positions in these peptides. Results: Certain types of residues were observed to be preferred over others in antibacterial peptides, particularly at the N and C termini. These observations encouraged us to develop a method for predicting antibacterial peptides in proteins from their amino acid sequence. First, the N-terminal residues were used for predicting antibacterial peptides with an Artificial Neural Network (ANN), Quantitative Matrices (QM) and a Support Vector Machine (SVM), which achieved accuracies of 83.63%, 84.78% and 87.85%, respectively. Then, the C-terminal residues were used to develop prediction methods, which achieved accuracies of 77.34%, 82.03% and 85.16% using ANN, QM and SVM, respectively. Finally, ANN, QM and SVM models were developed using both N- and C-terminal residues, which achieved accuracies of 88.17%, 90.37% and 92.11%, respectively. All the models developed in this study were evaluated using five-fold cross-validation. These models were also tested on an independent (blind) dataset. Conclusion: Among antibacterial peptides, there is a preference for certain residues at the N and C termini, which helps to distinguish them from non-antibacterial peptides. Both termini play a crucial role in imparting the antibacterial property to these peptides. Among the methods developed, SVM shows the best performance in predicting antibacterial peptides, followed by QM and then ANN.
AntiBP (Antibacterial peptides) will help in discovering efficacious antibacterial peptides, which we hope will prove to be a boon in combating dreadful antibiotic-resistant bacteria. A user-friendly web server, accessible at http://www.imtech.res.in/raghava/antibp/, has also been developed to help the biological community.
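The abstract describes models built from N- and C-terminal residues and evaluated with five-fold cross-validation. The Python sketch below shows one plausible way to turn a peptide's termini into a fixed-length feature vector (amino acid composition of each terminal window) and to split a dataset into five folds. The 15-residue window, the composition encoding and the helper names are assumptions for illustration; the abstract does not document AntiBP's actual encoding.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def terminal_composition(peptide: str, window: int = 15) -> list[float]:
    """Composition of the first and last `window` residues (40 features).

    Hypothetical encoding: fraction of each of the 20 standard residues in
    the N-terminal window, followed by the same for the C-terminal window.
    Peptides shorter than `window` use the whole sequence for both termini.
    """
    features = []
    for segment in (peptide[:window], peptide[-window:]):
        n = len(segment)
        features.extend(segment.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS)
    return features


def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to folds round-robin; shuffle the data first in a
    real experiment so that folds are not biased by input order.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each peptide becomes a 40-dimensional vector that any of the classifiers named above could consume, and every sample lands in exactly one test fold across the five splits.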

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A comparison of common programming languages used in bioinformatics

Author: A Conesa
AB Clegg
D Butt
D Posada
EM Zdobnov
GPS Raghava
H Mangalam
L Prechelt
LJ McGuffin
Mathieu Fourment
Michael R Gillings
MK Kuhner
N Saitou
RA Irizarry
S Guindon
SF Altschul
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST output files were implemented in C, C++, C#, Java, Perl and Python. Results: Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux, and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. Conclusion: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
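The Sellers algorithm benchmarked here is a dynamic-programming method for approximate string matching: it reports where a pattern occurs in a longer text with at most k edits. A minimal Python sketch follows, assuming unit costs for substitutions, insertions and deletions; the benchmark's exact scoring scheme is not given in the abstract, and the function name is illustrative.

```python
def sellers(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_position, edit_distance) for every text position where
    some substring ending there matches `pattern` with at most k edits.

    Unit edit costs; end positions are 1-based indices into `text`.
    """
    m = len(pattern)
    # column[i] holds D[i][j]: the best edit distance between pattern[:i]
    # and any substring of text ending at position j (j = 0 before the loop).
    column = list(range(m + 1))
    hits = []
    for j, t_char in enumerate(text, start=1):
        prev_diag = column[0]  # D[0][j-1]
        column[0] = 0          # free start: a match may begin anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t_char else 1
            current = min(prev_diag + cost,   # match or substitution
                          column[i] + 1,      # D[i][j-1] + 1: extra text char
                          column[i - 1] + 1)  # D[i-1][j] + 1: pattern char dropped
            prev_diag, column[i] = column[i], current
        if column[m] <= k:
            hits.append((j, column[m]))
    return hits
```

The only difference from a plain edit-distance table is that row 0 is reset to zero for every text position, which is what lets a match start anywhere. For example, `sellers("abd", "xxabcxx", 1)` reports the end positions 4 and 5, each with one edit. Keeping a single column keeps memory linear in the pattern length.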

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Macquarie University ResearchOnline

Identification of the PDI-Family Member ERp90 as an Interaction Partner of ERFAD

Author: A Pirneskoski
B Rost
B Wilkinson
C Appenzeller-Herzog
C Fagioli
C Hirsch
Christian Appenzeller-Herzog
DT Jones
EM Frickel
F Hatahet
G Dong
G Kozlov
G Tian
GPS Raghava
GZ Lederkremer
Henning G. Hansen
J Haugstetter
J Hoseki
J Loureiro
J Riemer
Jan Riemer
JC Christianson
Lars Ellgaard
Linda Johansson
M Hagiwara
P Klappa
R Ushioda
S Shim
SS Vembar
Sue Cotterill
T Anelli
Publication venue: Public Library of Science
Publication date: 01/01/2011

In the endoplasmic reticulum (ER), members of the protein disulfide isomerase (PDI) family perform critical functions during protein maturation. Herein, we identify the previously uncharacterized PDI-family member ERp90. In cultured human cells, we find ERp90 to be a soluble ER-luminal glycoprotein that comprises five potential thioredoxin (Trx)-like domains. Mature ERp90 contains 10 cysteine residues, of which at least some form intramolecular disulfides. While none of the Trx domains contain a canonical Cys-Xaa-Xaa-Cys active-site motif, other conserved cysteines could endow the protein with redox activity. Importantly, we show that ERp90 co-immunoprecipitates with ERFAD, a flavoprotein involved in ER-associated degradation (ERAD), through what is most likely a direct interaction. We propose that the function of ERp90 is related to substrate recruitment or delivery to the ERAD retrotranslocation machinery by ERFAD.
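The canonical Trx active site mentioned above is the Cys-Xaa-Xaa-Cys (CXXC) motif, where Xaa is any residue. As a rough illustration, the Python sketch below counts cysteines and lists CXXC occurrences in a protein sequence with a regular expression. The function name and the toy sequence in the usage note are hypothetical (not ERp90), and a real analysis would restrict the search to annotated Trx-domain boundaries rather than the whole sequence.

```python
import re

# Zero-width lookahead so overlapping motifs (e.g. "CAACAAC") are all reported;
# a plain pattern would consume the shared cysteine and miss the second motif.
CXXC = re.compile(r"(?=(C..C))")


def cysteine_report(sequence: str) -> dict:
    """Count cysteines and locate Cys-Xaa-Xaa-Cys motifs (0-based starts)."""
    seq = sequence.upper()
    return {
        "cysteines": seq.count("C"),
        "cxxc_motifs": [(m.start(), m.group(1)) for m in CXXC.finditer(seq)],
    }
```

For the toy sequence `"MKCAACAACW"` this reports 3 cysteines and two overlapping motifs starting at positions 2 and 5. A sequence can therefore contain many cysteines yet return an empty motif list, which is the situation the abstract describes for ERp90's Trx domains.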

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

edoc

PubMed Central

Copenhagen University Research Information System

Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix

Author: A Bateman
A Kuzniar
A Nekrutenko
Andrew Ndhlovu
B Rost
C Angermüller
C Chothia
CA Orengo
CR Madden
DA Benson
F Sievers
FS-M Pais
G Celniker
GPS Raghava
J Reese
J Sadri
JD Thompson
K Katoh
K Yamada
L Goodstadt
M Anisimova
M Kimura
MC Kew
MN Price
MR Aniba
O Gotoh
PA Nuin
Pierre M. Durand
PM Durand
R Kolodny
R Nielsen
RF Doolittle
S Guindon
S Henikoff
S Murakami
SB Needleman
Scott Hazelhurst
SF Altschul
SL Kosakovsky Pond
SL Pond
T Pupko
TJ Liang
Z Yang
Publication venue: Springer Science and Business Media LLC

Crossref

Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Author: Aanei CM
Abid MB
Abramowitz MK
Abu-Zaid A
Afnan M
Agarabi C
Ahmad R
Aizat WM
Al-Farha AA
Al-Lawama M
Alanio A
Alaux C
Albiol J
Albrecht DR
Albuquerque LG
Alimba CG
Allardyce J
Almeida GMF
Alonso-Caneiro D
Alper OM
Amer SEDR
Amiya E
Ammerman BA
Amorim RM
An Q
Andersen SU
Aplin JD
Argyropoulos C
Armitage C
Ascher DB
Ashry M
Asmann YW
Assaeed AM
Atack JM
Atanasov AG
Atchison DA
Atkins GJ
Atlas L
Avery SV
Avillach P
Baade PD
Backman L
Badie C
Bae T
Baier D
Baker CI
Bakkach J
Baldi A
Ball E
Bannon R
Bansal A
Bardot O
Barnett AG
Barraud P
Basharat Z
Basner M
Batra J
Baumert P
Bazanova OM
Beale A
Beck CR
Becker D
Beddoe T
Bell ML
Benezeth Y
Bengtsson-Palme J
Berbesque C
Berezikov E
Bergsland N
Berners-Price S
Bernhardt P
Berrevoet F
Berry E
Berthold M
Bessa TB
Beyene TJ
Biedermann PHW
Bijleveld E
Billington C
Birch J
Bittner F
Bitzer M
Blakely RD
Blanck O
Blaskovich MAT
Bleackley M
Blombach F
Blum R
Boehme KA
Boelaert M
Bogdanos D
Bonvin AMJJ
Bosch C
Bosch O
Boudreau SA
Bourgoin T
Bourke E
Bouvard D
Boykin LM
Bradley G
Bradshaw W
Bramoweth AD
Brand T
Braubach O
Braun D
Braun RJ
Brenneisen P
Bridges KM
Brown JAL
Brown P
Browngardt C
Brownlie J
Bruhl A
Bukowy-Bieryllo Z
Bull JA
Burt A
Bush SJ
Butler LM
Byrareddy SN
Byrne HJ
Cabantous S
Cai Y
Calatayud S
Campana LG
Campbell M
Candal E
Cao Z
Cao Z
Cardoso P
Carlson K
Carter D
Cascella M
Casillas S
Castelvetro V
Caswell PT
Catry T
Cavalli G
Cernava T
Cerovsky V
Chacko G
Chagoyen M
Chakraborty S
Chan SS
Chandrasekaran AR
Chatzitheochari S
Chavez-Fumagalli MA
Chen B
Chen C-E
Chen C-S
Chen DF
Chen H
Chen H
Chen J-T
Chen X
Chen Y
Cheng C
Cheng J
Cheng S
Cheung JTK
Chinapaw M
Chinopoulos C
Cho WCS
Chong L
Chowdhury D
Chung H-J
Chwalibog A
Ciresi A
Cobine PA
Cockcroft S
Coelho LP
Colella V
Conesa A
Conway A
Cook PA
Cooper DN
Cooper J
Coqueret O
Corea EM
Cosacak MI
Costa BM
Costa E
Costa VD
Coupland C
Crawford SY
Cruz AD
Cui H
Cui Q
Cuiv PO
Culver DC
Cuypers M
Cyr N
D'Angiulli A
Dahms TES
Dai Z
Daigle F
Dalgleish R
Dalrymple BP
Danchin A
Danielsen HE
Darras S
Daulatabad SV
Davidson SM
Day DA
de Keersmaecker K
de Leeuw F-E
Dean LT
Debrabant B
Degirmenci V
del Tredici AL
Delahay RM
Demaison L
Denzel MS
Deschodt M
Devkota HP
Devriendt K
Dhariwal R
Diao J
Ding J
Dings RPM
Diouf B
Dixon R
Dlamini SV
Dogan Y
Domingues HS
Dong XC
Donner CF
Dono M
Doxey AC
Dressick W
Drevon CA
Duan H
Ducho C
Ducommun B
Dudley KJ
Dufies M
Duijf PHG
Dumaz N
Dwarakanath BS
Ebell MH
Echeverria N
Ecke T
Eckweiler D
Eerola T
Effiong A
Ehret F
Eisenhardt S
Eixarch E
El-Adawy H
El-Esawi MA
Elkum N
Emmrich JV
Engel MS
Engel N
Epp T
Erickson TB
Esfahlani SS
Eskelinen E-L
Eskew EA
Esnakul AK
Eustace AJ
Evangelou E
Fairhead M
Falk S
Fallah M
Falter-Wagner CM
Fan X
Farber DB
Faville MJ
Feghali KA
Fejzo MS
Fernandez-Triana J
Festa F
Feteira A
Feyerabend F
Fierz W
Filipp FV
Fiona .
Flegel WA
Flood-Page P
Florio T
Forano E
Forsayeth J
Fox SA
Franks SJ
Frentiu FD
Friebe M
Frilander MJ
Fu X
Fujita S
Furuta S
Fuss J
Gabrielsen M
Gajda M
Galea I
Galluzzi L
Gani F
Ganpule AP
Gao J
Garcia-Alix A
Gatchell M
Gaullier G
Gedye K
Gelfer Y
Ghelardi E
Gill MR
Gilliham M
Giordano M
Giunta C
Gladue DP
Gleeson PA
Gloyn L
Gnasso A
Goarant C
Gobet A
Goggs R
Gong H
Gonzalezlez-Prendes R
Goodin A
Goodyear CS
Gora D
Gough MJ
Govender P
Govinden U
Goyal R
Graham EB
Graham KE
Grande-Perez A
Graves PM
Greene G
Greenwald NF
Greidanus H
Greiff V
Grice D
Grimm DG
Groen EJN
Gruber J
Grunau C
Grundle DS
Gruneberg P
Grybos M
Guisado JL
Gumede N
Gumulya Y
Guo Y
Gurevich VV
Gurney-Champion OJ
Gusev O
Gutierrez-Sacristan A
Habes M
Hacker E
Hage SR
Hagen G
Hahn S
Haller DM
Hammerschmidt S
Han H
Han J
Han Q
Han R
Handfield M
Hanson J
Haore G
Hapuarachchi HC
Harder T
Hardingham JE
Harrison P
Hartmann MD
Harvey DJ
Haston S
Heck M
Heers M
Heffler E
Heinrich M
Helantera H
Herbelet S
Hew KF
Higginbottom DB
Higuchi Y
Hilton R
Hiroi N
Hobbs E
Hodzic E
Hoenner X
Hojsgaard D
Hone A
Hongoh Y
Honjo K
Horbar J
Hori H
Hu G
Hu P
Huber HP
Huber M
Hueso LE
Huirne J
Hurt L
Huttner FJ
Idborg H
Ide K
Ikeo K
Ikonomopoulou MP
Ingley E
Jakeman PM
Janga SC
Janzen T
Jayaraman J
Jeltsch A
Jensen A
Jeurissen P
Jia H
Jia H
Jia S
Jiang F
Jiang J
Jiang X
Jibb LA
Jin Y
Jo D
Johnson AM
Johnson DM
Johnston M
Jongen S
Jonscher KR
Jorens PG
Jorgensen JOL
Josse C
Joubert JW
Jung S-H
Junior AM
Jurman G
Kabra D
Kahan T
Kaiser S
Kamagata K
Kamboj SK
Kamiya H
Kane NC
Kang Y-K
Karamanos Y
Karmakar C
Karp NA
Kasian O
Kauppila JH
Kaye LK
Kelly R
Kelly S
Kenna R
Kennedy J
Kersten B
Khalaf RA
Khalid JM
Khan MM
Khatlani T
Khider T
Kijanka GS
Kim Y-M
King SRB
Kinyanjui T
Kish JK
Klempnauer K-H
Kleppe A
Klump H
Kluz T
Knox P
Kobayashi T
Kobold S
Koch K-W
Kohanbash G
Kohls G
Kohonen-Corish MRJ
Koleva-Kolarova RG
Kong X
Konkle-Parker D
Korpela KM
Kostrikis LG
Kraiczy P
Kratz H
Krause G
Krebsbach PH
Kristensen SR
Kristiansson E
Kueberuwa G
Kugler J-M
Kulkarni A
Kumar G
Kumar N
Kumar N
Kumari P
Kunimatsu A
Kurdak H
Kurgan L
Kurniawan NA
Kwon YD
Lachat C
Lacy-Colson J
Lagisz M
Lai HM
Laky B
Lalaouna D
Lammerding J
Lange M
Larrosa M
Laslett AL
Latif A
Lau CL
Lauschke VM
LeClair EE
Lee K-W
Lee M-S
Lee M-Y
Lee S
Li B
Li G
Li J
Li J
Li J
Li Z
Liang D
Liang S
Lidbury BA
Lieb K
Liehr T
Liew AWC
Lim CJ
Lim YY
Lin MZ
Lindsey ML
Line P-D
Liu D
Liu E
Liu F
Liu F
Liu H
Liu H
Liu S
Liu X
Liu Y-P
Lloyd VK
Lo T-W
Locci E
Loft ND
Loidl J
Lopez-Escamez JA
Lopez-Ruiz FJ
Lorenzen J
Lorkowski S
Lovell NH
Lu H
Lu J-J
Lu Q
Lu W
Lu Z
Luengo GS
Lund BA
Lundh L-G
Lussier AA
Luu AM
Lynch I
Lysy PA
Ma C
Ma L
Ma L
Ma L
Ma R
Ma W
Mabb A
Mack HG
Mackey DA
Mahavadi P
Mahdavi SR
Maher P
Maher T
Maibach EW
Maity SN
Malgrange B
Mamoulakis C
Mangoni AA
Manke T
Manstead ASR
Mantalaris A
Marchbank KJ
Marinello F
Marsal J
Marschalek R
Marschall H-U
Martin CS
Martin FL
Martinez-Raga J
Martinez-Salas E
Martis E
Marzocchi U
Mather DE
Mathieu D
Matsui Y
Maza E
McCrum C
McCutcheon JE
McGarrigle CA
Mckay GJ
McMillan B
McMillan N
Meads C
Medina L
Merrick BA
Meseko C
Metzger DW
Meule A
Meunier FA
Michaelis M
Micheau O
Miele AE
Mier P
Mihara H
Min R
Mintz EM
Miotla P
Mitchell KM
Mizukami T
Moal I
Moalic Y
Mohapatra DP
Molari M
Molleman L
Mondal SR
Montagutelli X
Monteiro A
Montes M
Moore MD
Moran JV
Morcillo E
Morozov SY
Mort M
Moss WN
Moultos OA
Moyer R
Mukherjee M
Murai N
Murphy DJ
Murphy SK
Murray SA
Muth T
Naganawa S
Nagler K
Nakayama K
Nammi S
Nandakumar KS
Narayan E
Nasios G
Natoli RM
Navaratnarajah .
Neumann P-A
Ng G
Nguyen F
Nicol C
Nicoletti R
Nie J
Nie Y
Niehof M
Niemeyer F
Nilsen EB
Nilsson H
Nixon B
Nobile CJ
Norris AD
Nwaiwu O
O'Mahony M
O'Toole R
Ogami K
Ohgami RS
Ohlsson S
Ohtomo T
Olatunbosun O
Oldenmenger WH
Olofsson P
Olumayede E
Orme MW
Ortiz A
Oster H
Ostrikov K
Otto S
Ou J
Outeiro TF
Ouyang S
Paganoni S
Page A
Pallebage-Gamarallage M
Palm C
Palma J-A
Pan Z
Panthee S
Paradies Y
Parchi P
Parsons JR
Parsons MH
Parsons N
Pascal P
Paterson R
Paul E
Pearce SP
Pearson JA
Peckham M
Pedemonte N
Peifer M
Pelkonen T
Pelleri MC
Pellizzon MA
Peng Y
Perco P
Pereira JL
Peres MA
Petrelli M
Pheko M
Pichugin A
Pinto CJC
Pinto IM
Pinto KA
Piotrowski M
Piovesan A
Plevris JN
Pluess M
Podolsky IM
Pollesello P
Polz M
Ponti G
Popoola SI
Porcelli P
Portilla M
Portillo MC
Pourret O
Prajapati AS
Pranata R
Prescott J
Prieto D
Prince M
Pritchard AL
Pusch S
Qi D
Qi X
Quinn GP
Quinn TJ
Raghava GPS
Rahimi F
Rahman MS
Raikou VD
Ramula S
Ranft A
Rappsilber J
Reddan T
Rehfeldt F
Reiling JH
Remacle C
Reschke CR
Rezaei M
Rhodes J
Riddick EW
Ritter U
Riva G
Roach NW
Roberts DD
Roberts NJ
Robles G
Rodrigues T
Rodriguez C
Roislien J
Roobol MJ
Ross K
Ross SA
Rotge J-Y
Rowe AD
Rowe JA
Ruepp A
Rust P
Saad S
Sabnis SC
Sack GH
Saggar M
Saito Y
Salama MF
Sallmon H
Santos M
Saudemont A
Sava G
Schrading S
Schramm A
Schreiber M
Schuele B
Schuler S
Schulte LN
Schuon RA
Schymkowitz J
Sczyrba A
Seib KL
Senghore T
Seow E
Sergeant K
Shabalin IG
Shahid S
Shalchyan V
Shen J
Shi H-P
Shimada T
Shin J-S
Shortt C
Siebers R
Sillanpaa E
Silveyra P
Skinner D
Small I
Smeets PAM
Smith SS
So P-W
Solano F
Sonenshine DE
Song H
Song J
Sorzano CO
Southall T
Speakman JR
Srinivasan MV
St Hilaire C
Stabile LP
Staege MS
Stasiak A
Steadman KJ
Stein N
Stella A
Stephens AW
Stevanovic D
Stewart CJ
Stewart DI
Stine K
Storlazzi C
Stoynova NV
Strzalka W
Suarez OM
Subhash S
Sukocheva O
Sultana T
Sumant AV
Summers MJ
Sun G
Sydes M
Tacon P
Tamaian R
Tan A-C
Tan E-C
Tan K-H
Tanaka K
Tang H
Tanino Y
Targett-Adams P
Tayebi M
Tayyem R
Tebbe CC
Telfer EE
Tempel W
Teodorczyk-Injeyan JA
Terrier O
Testoni I
Thijs G
Thorne S
Thrift AG
Tiffon C
Tinnefeld P
Tjahjono DH
Tofani M
Tolle F
Torga G
Toth E
Tressoldi P
Troder SE
Tsapas A
Tsirigotis K
Turak A
Tuttle N
Tzotzos G
Uchendu F
Udo EE
Uhle F
Utsumi T
Uversky VN
Vaidyanathan S
Vaillant M
Valsesia A
Van de Mortel T
Van den Bos W
van Meerten T
van Nieuwerburgh F
van Raaij MJ
van Ruitenbeek J
Vandenbroucke RE
Vanneste S
Veiga FH
Vendrell M
Verloh N
Vesk PA
Vickers P
Victor VM
Villemur R
Villet MH
Vindin H
Viveiros M
Vohl M-C
Voolstra CR
Vorholt JA
Voskarides K
Voutchkova DD
Vuillemin A
Wakelin S
Waldron L
Walsh LJ
Wang AY
Wang F
Wang Y
Watanabe Y
Weigert A
Weinstock C
Wen J-C
Werner GDA
Werten S
Westermair AL
Wham C
White EP
Widera D
Wiener J
Wilharm G
Wilkinson S
Williams R
Willmann R
Wilson C
Wirth B
Wojan TR
Woldesemayat AA
Wolff M
Wong A
Wong BM
Wu T-W
Wuerbel H
Xia W
Xiao X
Xu D
Xu H
Xu J
Xu J
Xu JW
Xue B
Xue Y
Yadollahpour A
Yalcin S
Yamato M
Yan H
Yang E-C
Yang H
Yang L
Yang S
Yang SY
Yang W
Yang Y
Ye Y
Ye Z-Q
Yeung AWK
Yin C-C
Yli-Kauhaluoma J
Yoneyama H
Yu Y
Yuan G-C
Yuh C-H
Zabetakis I
Zaccolo M
Zaucha J
Zeng C
Zeng E
Zevnik B
Zhang C
Zhang C
Zhang J
Zhang L
Zhang L
Zhang X
Zhang Y
Zhang Y
Zhang Z
Zhang Z
Zhang Z-Y
Zhao X
Zhao Y
Zhou K
Zhou M
Zhu S
Ziegler A
Zinke K
Zuberbier T
Publication venue: Oxford University Press
Publication date: 29/10/2019

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles completely. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading the annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
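Okapi BM25, one of the three baselines named above, ranks candidate documents by a term-frequency score that saturates with repeated terms and is normalized by document length. The Python sketch below scores pre-tokenized documents against a tokenized query (for example, the seed article's title). The parameter values k1 = 1.2 and b = 0.75, the non-negative IDF variant and the choice to count each distinct query term once are common defaults assumed here; the abstract does not state the settings used for RELISH.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], documents: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query.

    Each distinct query term is counted once; IDF uses the non-negative
    variant ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    if not documents:
        return []
    n_docs = len(documents)
    avgdl = (sum(len(d) for d in documents) / n_docs) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

Sorting candidates by these scores gives a ranked recommendation list. Documents sharing no query term score exactly zero, which is one reason a term-matching baseline can return a different set of articles from a citation- or MeSH-based method such as PubMed Related Articles.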

UCL Discovery

DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment

Author: A Godzik
A Kloczkowski
AS Konagurthu
C Kemena
D Kihara
DA Morrison
DR Noguera
DT Jones
E Bindewald
Erik S. Wright
ES Wright
F Armougom
F Morcos
F Sievers
G Blackshields
G Jordan
G Tan
GE Crooks
GPS Raghava
H Zhou
I Walle Van
J Garnier
J Jorda
J Pei
JD Thompson
JG Henikoff
JM Hancock
JM Sauder
K Boyce
K Katoh
K Mizuguchi
M Cline
MK Kalita
MR Aniba
MS Breen
MSS Chang
P Katsonis
Q Li
R Core Team
R Kim
R Szklarczyk
RC Edgar
RC Gentleman
RD Finn
S Iantorno
S Mirarab
S Pascarella
SF Altschul
TM Phuong
VA Simossis
W Fletcher
W Kabsch
X Deng
Y Wang
Publication venue: Springer Science and Business Media LLC

Crossref

Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein

Author: Han JH
Raghava GPS
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Background: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, there are only a few studies, using expression data, of the relation between expression level and the composition of nucleotide/protein sequences. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition. Results: We compute the correlation between the expression of a gene and the amino acid composition of its protein. Some residues (like Ala, Gly, Arg and Val) were observed to have a significant positive correlation (r > 0.20) and some other residues (like Asp, Leu, Asn and Ser) a negative correlation (r < -0.15) with the expression of genes. A significant negative correlation (r = -0.18) was also found between length and gene expression. These observations indicate a relationship between percent composition and gene expression level. Attempts have therefore been made to develop a Support Vector Machine (SVM) based method for predicting the expression level of genes from their protein sequences. In this method, the SVM is trained with proteins whose gene expression data are known in a given condition. The trained SVM is then used to predict the gene expression of other proteins of the same organism in the same condition. A correlation coefficient of r = 0.70 was obtained between predicted and experimentally determined expression of genes, which improved to r = 0.72 when dipeptide composition was used instead of residue composition. The method was evaluated using a 5-fold cross-validation test.
We also demonstrate that amino acid composition information, along with gene expression data, can be used to improve the functional classification of proteins. Conclusion: There is a correlation between gene expression and amino acid composition that can be used to predict the expression level of genes to a certain extent. A web server based on the above strategy has been developed for calculating the correlation between amino acid composition and gene expression and for predicting expression levels: http://kiwi.postech.ac.kr/raghava/lgepred/. This server will allow users to study the evolution from expression data.
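The two sequence encodings compared above, residue (amino acid) composition and dipeptide composition, and the Pearson correlation used to relate them to expression can be sketched in a few lines of Python. The helper names are illustrative, and composition is expressed as percentages, which is one common convention assumed here; the SVM regression step itself is not shown.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aa_composition(seq: str) -> list[float]:
    """Percentage of each of the 20 standard residues (20 features)."""
    counts = Counter(r for r in seq if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [100.0 * counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Percentage of each overlapping residue pair (400 features)."""
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    total = sum(counts.values())
    return [100.0 * counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

To reproduce a per-residue correlation such as the one reported for Ala, one would compute `aa_composition` for every protein, take the Ala column, and pass it to `pearson_r` together with the matching expression levels. The 400-feature dipeptide vector is the richer encoding the abstract credits with raising r from 0.70 to 0.72.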

Pohang University of Science and Technology (POSTECH)

Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

Author: Bang SY
Choi SJ
Kim JK
Raghava GPS
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Predicting the destination of a protein in a cell is important for annotating the function of the protein. Recent advances have allowed us to develop more accurate methods for predicting the subcellular localization of proteins. One of the most important factors for improving the accuracy of these methods is the introduction of new, useful features for protein sequences. In this paper, we present a new method for extracting appropriate features from sequence data by computing pairwise sequence alignment scores. A support vector machine (SVM) is used as the classifier. The overall prediction accuracy, evaluated by the jackknife validation technique, reached 94.70% for the eukaryotic non-plant data set and 92.10% for the eukaryotic plant data set, which is the highest prediction accuracy among the methods reported so far on these data sets. Our experimental results confirm that our feature extraction method based on pairwise sequence alignment is useful for this classification problem. (c) 2006 Elsevier B.V. All rights reserved.
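The central idea above is to describe a protein by its pairwise alignment scores against a set of reference sequences and to hand that vector to an SVM. A minimal Python sketch of the feature step follows, using Smith-Waterman local alignment with a simple match/mismatch scheme and a linear gap penalty. The scores (+2, -1, -2) and the helper names are assumptions for illustration; the abstract does not say which alignment algorithm, substitution matrix or gap penalties the authors used, and a real implementation would likely use a matrix such as BLOSUM62.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty.

    Keeps only the previous DP row, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: scores are floored at zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def alignment_features(query: str, references: list[str]) -> list[int]:
    """Hypothetical feature vector: one alignment score per reference protein."""
    return [smith_waterman_score(query, ref) for ref in references]
```

With the training proteins as references, each query becomes a vector of similarity scores, so proteins resembling members of one localization class get high values in that class's positions. Under jackknife validation, the held-out protein should be excluded from its own reference set.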

Pohang University of Science and Technology (POSTECH)

The figure depicts the sequence logo of the last fifteen residues (C-terminus) of antibacterial peptides, where the size of each residue is proportional to its propensity

Author: BK Sharma (35797)
GPS Raghava (35798)
Sneh Lata (35796)

Copyright information: Taken from "Analysis and prediction of antibacterial peptides", http://www.biomedcentral.com/1471-2105/8/263, BMC Bioinformatics 2007;8:263. Published online 23 Jul 2007. PMCID: PMC2041956.
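A logo in which residue size is proportional to propensity needs, for each of the last fifteen positions, the frequency of each residue at that position relative to its background frequency. The Python sketch below computes such a ratio after right-aligning peptides on their C-termini. This frequency-over-background definition, the right-alignment of short peptides and the function name are assumptions; the caption does not state how propensity was calculated for the figure.

```python
from collections import Counter


def c_terminal_propensity(peptides: list[str], background: dict[str, float],
                          window: int = 15) -> list[dict[str, float]]:
    """Per-position propensity for the last `window` residues.

    Position window-1 is always the C-terminal residue; peptides shorter
    than `window` are right-aligned. Propensity = observed frequency at
    the position divided by the residue's background frequency.
    """
    counts = [Counter() for _ in range(window)]
    for pep in peptides:
        tail = pep[-window:]
        offset = window - len(tail)  # right-align short peptides
        for i, residue in enumerate(tail):
            counts[offset + i][residue] += 1

    propensities = []
    for pos_counts in counts:
        total = sum(pos_counts.values())
        propensities.append({
            aa: (n / total) / background[aa]
            for aa, n in pos_counts.items()
            if background.get(aa)  # skip residues without a background value
        })
    return propensities
```

Values above 1 mark residues enriched at a position relative to the background, which a logo would draw larger. The background dictionary could, for example, hold residue frequencies in a set of non-antibacterial peptides.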

FigShare